[Linkpost] “Q2 AI Benchmark Results: Pros Maintain Clear Lead” by Benjamin Wilson 🔸, johnbash, Metaculus
Description
By Ben Wilson and John Bash from Metaculus
Main Takeaways
Top Findings
- Pro forecasters significantly outperform bots: Our team of 10 Metaculus Pro Forecasters demonstrated superior performance compared to the top-10 bot team, with strong statistical significance (p = 0.00001) based on a one-sided t-test on Peer scores.
- The bot team did not improve significantly in Q2 relative to the human Pro team: The bot team's head-to-head score against Pros was -11.3 in Q3 2024 (95% CI: [-21.8, -0.7]), -8.9 in Q4 2024 (95% CI: [-18.8, 1]), -17.7 in Q1 2025 (95% CI: [-28.3, -7.0]), and -20.03 in Q2 2025 (95% CI: [-28.63, -11.41]), with no clear trend emerging. (Reminder: a lower head-to-head score indicates worse relative accuracy. A score of 0 corresponds to equal accuracy.)
Other Takeaways
- This quarter's winning bot is open-source: Q2 Winner Panshul has very generously made his bot open-source. The bot writes separate “outside view” and “inside view” [...]
---
Outline:
(00:20) Main Takeaways
(03:24) Introduction
(04:30) Methodology
(13:59) How do LLMs Compare?
(17:18) Which Bot Strategy is Best?
(23:04) Are Bots Better than Human Pros?
(25:38) Binary vs Numeric vs Multiple Choice Questions
(27:07) Team Performance Over Quarters
(31:14) Bot Maker Survey
(31:40) Best practices of the best-performing bots
(38:27) Other Survey Results
(41:32) How did scaffolding do?
(45:33) Advice from Bot Makers
(53:48) Links to Code and Data
(54:56) Future AI Benchmarking Tournaments
---
First published:
October 28th, 2025
Linkpost URL:
https://www.metaculus.com/notebooks/40456/q2-ai-benchmark-results/
---
Narrated by TYPE III AUDIO.
---